Run your own LLM
NexaAI: Hub for On-Device AI
Featuring a Local File Organizer that can automatically rename and organize local files based on their content.
NexaSDK: a comprehensive toolkit supporting ONNX and GGML models, with text generation, image generation, vision-language model (VLM), automatic speech recognition (ASR), and text-to-speech (TTS) capabilities.
Ollama
Eric Hartford describes step-by-step how to run an Ollama-based set of LLM models locally for conversation and chat. Model sizes start at 3 GB, with 6 GB+ of RAM needed.
Import your own models from GGUF, PyTorch, or Safetensors and add them to the Ollama repository with these instructions.
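Once a model is pulled into Ollama, you can also drive it from a script instead of the CLI. Here's a minimal sketch using the official ollama Python package, assuming the Ollama server is already running locally and a model named llama3.2 has been pulled (the model name is just an example):

```python
# pip install ollama -- talks to a local Ollama server (default http://localhost:11434)
import ollama

# Assumes you have already run `ollama pull llama3.2` (model name is an example)
response = ollama.chat(
    model="llama3.2",
    messages=[{"role": "user", "content": "Explain in two sentences why running an LLM locally is useful."}],
)

# The reply text lives under message -> content
print(response["message"]["content"])
```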
LM Studio
Run a model locally with LM Studio. With LM Studio, you can …
🤖 - Run LLMs on your laptop, entirely offline
👾 - Use models through the in-app Chat UI or an OpenAI-compatible local server
📂 - Download any compatible model files from Hugging Face 🤗 repositories
But I find it’s pretty slow…on my 8GB M2 Mac.
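Because LM Studio's local server speaks the OpenAI chat-completions protocol, you can point the standard openai client at it. A minimal sketch, assuming the server is running on its default port 1234 and that the model identifier matches one you've loaded in the app (both are assumptions to check against your own setup):

```python
# pip install openai -- LM Studio's local server mimics the OpenAI API
from openai import OpenAI

# Default LM Studio endpoint; the API key is a placeholder since no auth is needed locally
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="local-model",  # use whatever identifier LM Studio shows for your loaded model
    messages=[{"role": "user", "content": "Write a haiku about running models offline."}],
)

print(response.choices[0].message.content)
```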
GPT4All
Open-source models from Nomic let you run large language models locally.
I tried running this on my 8GB M2 Mac with a very small model from Phi-2-GGUF. It runs quickly and without overly burdening the rest of my system.
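GPT4All also ships Python bindings, so the same small model can be driven from a script. A rough sketch, assuming a Phi-2 GGUF filename from the GPT4All model catalog (check the exact filename in the app's model list before using it):

```python
# pip install gpt4all -- runs the model fully locally, downloading it on first use
from gpt4all import GPT4All

# Model filename is an assumption; pick the Phi-2 GGUF entry shown in GPT4All's model list
model = GPT4All("phi-2.Q4_0.gguf")

# chat_session keeps conversational context for the duration of the block
with model.chat_session():
    print(model.generate("Write a Python function that reverses a string.", max_tokens=200))
```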
Here’s an example of it writing code: